Measuring k-Wise Independence of Streaming Data under L2 Norm

Authors

  • Vladimir Braverman
  • Rafail Ostrovsky
Abstract

Measuring independence and k-wise independence is a fundamental problem with many applications, and it has been the subject of intensive research during the last decade (see, among others, the recent work of Batu, Fortnow, Fischer, Kumar, Rubinfeld and White [11] and of Alon, Andoni, Kaufman, Matulef, Rubinfeld and Xie [2]). In the streaming environment, this problem was first addressed by Indyk and McGregor [44]. In this model the joint distribution is given empirically by a stream of elements, and the goal is to measure the distance between the joint distribution and the product distribution under the Lp distance. Indyk and McGregor give elegant solutions for estimating pairwise independence under the L1 and L2 norms. The question of estimating k-wise independence on a stream of tuples, instead of pairs, is of central importance in applications where data typically comes with multiple attributes, such as database entries or minute-to-minute changes in stock prices in a financial portfolio. Indyk and McGregor state, as an explicit open question in their paper, the problem of whether one can estimate k-wise independence on k-tuples for any k > 2. In this paper we answer this open question of Indyk and McGregor [44] affirmatively for the L2 norm for any constant k. Our solution gives an (ε, δ)-approximation with O((1/ε) log(1/δ) (log n + log m)) memory bits, uses a single pass over the data, and tolerates deletions. The main technical contribution of our paper is a novel combinatorial approach to analyzing the second moment (i.e., variance) of dependent sketches. We believe that our method will be of independent interest. In our recent paper [15] we address the problem of measuring pairwise and k-wise independence under the L1 norm. In [15] we use different methods, which are not applicable to the L2 norm.
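
To make the measured quantity concrete, the following is a minimal offline sketch of the L2 distance between the empirical joint distribution of a stream of k-tuples and the product of its empirical marginals. It is not the paper's streaming algorithm: it stores full frequency tables, handles insertions only, and enumerates the whole product of the marginal supports, so it serves only as an exact baseline on small inputs. The function name `l2_independence_distance` is a hypothetical name chosen for this illustration.

```python
from collections import Counter
from itertools import product
import math


def l2_independence_distance(stream):
    """Exact L2 distance between the empirical joint distribution of
    k-tuples and the product of the empirical marginals.

    Offline baseline only (an assumption of this sketch): it keeps full
    frequency tables instead of a small-space sketch.
    """
    stream = list(stream)
    m = len(stream)
    if m == 0:
        raise ValueError("stream must contain at least one tuple")
    k = len(stream[0])

    # Empirical joint frequencies and one frequency table per coordinate.
    joint = Counter(stream)
    marginals = [Counter(t[i] for t in stream) for i in range(k)]
    supports = [sorted(c) for c in marginals]

    # Outside the product of the marginal supports both distributions
    # are zero, so it suffices to sum over that product.
    total = 0.0
    for point in product(*supports):
        p_joint = joint.get(point, 0) / m
        p_prod = math.prod(marginals[i][point[i]] / m for i in range(k))
        total += (p_joint - p_prod) ** 2
    return math.sqrt(total)
```

For example, a stream containing every tuple of {0, 1}^3 exactly once is perfectly 3-wise independent and yields distance 0, while the fully correlated stream `[(0, 0), (1, 1)]` yields 0.5.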

Similar resources

AMS Without 4-Wise Independence on Product Domains

In their fundamental work, Alon, Matias and Szegedy [3] presented celebrated sketching techniques and showed that 4-wise independence is sufficient to obtain good approximations. The question of what random functions are necessary is fundamental for streaming algorithms (see, e.g., Cormode and Muthukrishnan [9]). We present a somewhat surprising fact: on product domain [n], the 4-wise independe...

A New Method for Ranking Extreme Efficient DMUs Based on Changing the Reference Set with Using L2-Norm

The purpose of this study is to utilize a new method for ranking extreme efficient decision making units (DMUs) based upon the omission of these efficient DMUs from reference set of inefficient and non-extreme efficient DMUs in data envelopment analysis (DEA) models with constant and variable returns to scale. In this method, an L2 norm is used and it is believed that it doesn't have any e...

Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

Eigenphone-based speaker adaptation outperforms conventional maximum likelihood linear regression (MLLR) and eigenvoice methods when there is sufficient adaptation data. However, it suffers from severe over-fitting when only a few seconds of adaptation data are provided. In this paper, various regularization methods are investigated to obtain a more robust speaker-dependent eigenphone matrix es...

Lipschitzian Stability for State Constrained Nonlinear Optimal Control

For a nonlinear optimal control problem with state constraints, we give conditions under which the optimal control depends Lipschitz continuously in the L2 norm on a parameter. These conditions involve smoothness of the problem data, uniform independence of active constraint gradients, and a coercivity condition for the integral functional. Under these same conditions, we obtain a new nonoptima...

Lecture 2 The Chernoff bound and median-of-means amplification

In the previous lecture we introduced a streaming algorithm for estimating the frequency moment F2. The algorithm used very little space, logarithmic in the length of the stream, provided we could store a random function h : {1, ..., n} → {±1} "for free" in memory. In general this would require n bits of memory, one for each value of the function. However, observe that our analysis used the ...
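
The F2 estimator mentioned in this excerpt can be sketched as follows, under the excerpt's own simplifying assumption that a fully random sign function h is stored explicitly (n values per copy) rather than drawn from a limited-independence family. The function name `f2_estimate` and the parameters `groups`, `per_group` and `seed` are hypothetical choices for this illustration; the averaging and median steps follow the standard median-of-means pattern and are not taken from the lecture text itself.

```python
import random
import statistics


def f2_estimate(stream, n, groups=9, per_group=16, seed=0):
    """Estimate F2 = sum over items x of (frequency of x)^2 for a stream
    of items drawn from {1, ..., n}.

    Assumption of this sketch: each sign function h is stored explicitly
    as a list of n random values in {-1, +1}.
    """
    rng = random.Random(seed)
    copies = groups * per_group

    # signs[c][x - 1] plays the role of h_c(x).
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(copies)]

    # One counter Z_c = sum over stream items x of h_c(x) per copy.
    counters = [0] * copies
    for x in stream:
        for c in range(copies):
            counters[c] += signs[c][x - 1]

    # Z_c^2 is an unbiased estimate of F2: average within each group to
    # reduce variance, then take the median across groups.
    squares = [z * z for z in counters]
    means = [
        sum(squares[g * per_group:(g + 1) * per_group]) / per_group
        for g in range(groups)
    ]
    return statistics.median(means)
```

When the stream contains a single distinct item repeated five times, every counter equals +5 or -5, so the estimate is exactly 25, which matches the true F2.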

Publication date: 2009